SCALE: A Scalable Framework for Efficiently Clustering Transactional Data

Abstract

This paper presents SCALE, a fully automated framework for clustering transactional data. The SCALE design highlights three unique features. First, we introduce the concept of weighted coverage density as a categorical similarity measure for efficient clustering of transactional datasets. The concept is intuitive, and it allows the weight of each item in a cluster to change dynamically according to the occurrences of items. Second, we develop a clustering algorithm based on the weighted coverage density measure; the algorithm is fast, memory-efficient, and scalable for analyzing transactional data. Third, we introduce two clustering validation metrics and show that these domain-specific evaluation metrics are critical for capturing transactional semantics in clustering analysis. The SCALE framework combines clustering over a sample dataset, driven by the weighted coverage density measure, with self-configuring methods. These self-configuring methods automatically determine two important elements of our clustering algorithm: (1) the candidates for the best number K of clusters; and (2) the best result among the set of clustering results, selected by applying the two domain-specific cluster validity measures. We have conducted an extensive experimental evaluation using both synthetic and real datasets, and our results show that the weighted coverage density approach powered by the SCALE framework can efficiently generate high-quality clustering results in a fully automated manner.
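
The abstract describes weighted coverage density only informally and gives no formula. As an illustrative sketch, the Python below assumes one plausible form: each item's weight is its share of all item occurrences in the cluster, so the score reduces to the sum of squared item occurrences divided by (total occurrences × number of transactions). The function name, this formula, and the toy transactions are assumptions made for illustration and should be checked against the full paper.

```python
from collections import Counter


def weighted_coverage_density(cluster):
    """Score a cluster of transactions (each an iterable of items).

    ASSUMED form, not taken from the abstract:
        score = sum(occ(i)^2 for each item i) / (S * N)
    where occ(i) is the number of transactions containing item i,
    S is the sum of all occ(i), and N is the number of transactions.
    """
    n = len(cluster)
    if n == 0:
        return 0.0
    # Count each item at most once per transaction.
    occ = Counter(item for txn in cluster for item in set(txn))
    total = sum(occ.values())
    if total == 0:
        return 0.0
    # Frequent items receive larger weights (occ / total), so the weights
    # shift automatically as transactions join or leave the cluster.
    return sum(c * c for c in occ.values()) / (total * n)


# Hypothetical toy clusters.
dense = [{"milk", "bread"}, {"milk", "bread"}]
sparse = [{"milk", "bread"}, {"beer", "chips"}]
print(weighted_coverage_density(dense))
print(weighted_coverage_density(sparse))
```

Under this assumed form, a cluster whose transactions all share the same items scores 1.0, while a cluster of disjoint transactions scores lower, which matches the intuition that heavily overlapping transactions belong together.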